Probabilistic Anonymity

Authors

  • Sachin Lodha
  • Dilys Thomas
Abstract

In this age of globalization, organizations need to publish their micro-data owing to legal directives or share it with business associates in order to remain competitive. This puts personal privacy at risk. To surmount this risk, attributes that clearly identify individuals, such as Name, Social Security Number, and Driving License Number, are generally removed or replaced by random values. But this may not be enough, because such de-identified databases can sometimes be joined with other public databases on attributes such as Gender, Date of Birth, and Zipcode to re-identify individuals who were supposed to remain anonymous. In the literature, such an identity-leaking attribute combination is called a quasi-identifier. It is critical to be able to recognize quasi-identifiers and to apply appropriate protective measures to them in order to mitigate the identity disclosure risk posed by join attacks. In this paper, we start out by providing the first formal characterization of quasi-identifiers and a practical technique to identify them. We show an interesting connection between whether a set of columns forms a quasi-identifier and the number of distinct values assumed by the combination of the columns. We then use this characterization to come up with a probabilistic notion of anonymity. Again, we show an interesting connection between the number of distinct values taken by a combination of columns and the anonymity it can offer. This allows us to find an ideal amount of generalization or suppression to apply to different columns in order to achieve probabilistic anonymity. We work through many examples and show that our analysis can be used to make a published database conform to privacy acts like HIPAA. To achieve probabilistic anonymity, we observe that one needs to solve multiple 1-dimensional k-anonymity problems. We propose many efficient and scalable algorithms for achieving 1-dimensional anonymity. Our algorithms are optimal in the sense that they minimally distort data and retain much of its utility.
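The two ideas in the abstract, counting the distinct values of a column combination and anonymizing one column at a time, can be illustrated with a minimal Python sketch. It is not the authors' method: the function names `distinct_stats` and `one_dim_k_anonymize`, the sample table, and the greedy sort-and-group strategy are assumptions made only for illustration, and the greedy grouping makes no claim to the optimality described in the abstract.

```python
from collections import Counter


def distinct_stats(rows, columns):
    """Return (number of distinct value combinations, fraction of rows
    whose combination on `columns` is unique in the table).

    A high unique fraction suggests the columns may act as a
    quasi-identifier. The threshold to apply is left to the reader.
    """
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    unique_rows = sum(1 for key in keys if counts[key] == 1)
    return len(counts), unique_rows / len(keys)


def one_dim_k_anonymize(values, k):
    """Illustrative greedy 1-dimensional k-anonymization (hypothetical).

    Sort the values, cut them into consecutive groups of at least k
    items, and replace each value by its group's (min, max) range.
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need at least k values and k >= 1")
    order = sorted(range(n), key=lambda i: values[i])
    result = [None] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few left for another group: absorb them
            end = n
        group = order[start:end]
        low, high = values[group[0]], values[group[-1]]
        for i in group:
            result[i] = (low, high)
        start = end
    return result


# Hypothetical sample table, not taken from the paper.
rows = [
    {"gender": "F", "zip": "94305", "yob": 1980},
    {"gender": "F", "zip": "94305", "yob": 1980},
    {"gender": "M", "zip": "94305", "yob": 1975},
    {"gender": "M", "zip": "10001", "yob": 1975},
]
print(distinct_stats(rows, ["gender"]))
print(distinct_stats(rows, ["gender", "zip", "yob"]))
print(one_dim_k_anonymize([34, 21, 29, 45, 23, 38, 41], 3))
```

In this toy table, Gender alone singles out no row, while the combination of Gender, Zipcode, and year of birth singles out half of the rows. This is the kind of signal the distinct-value count is meant to surface.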

Related resources

Probabilistic Anonymity Via Coalgebraic Simulations

There is a growing concern about anonymity and privacy on the Internet, resulting in a large body of work on the formalization and verification of anonymity. In particular, many authors have recently stressed the importance of the probabilistic aspect of anonymity. Several different notions of “probabilistic anonymity” have been studied so far, but proof methods for such probabilistic notions are not yet elaborat...

Probabilistic Anonymity

The concept of anonymity comes into play in a wide range of situations, varying from voting and anonymous donations to postings on bulletin boards and sending mails. The systems for ensuring anonymity often use random mechanisms which can be described probabilistically, while the agents’ interest in performing the anonymous action may be totally unpredictable, irregular, and hence expressable o...

Probabilistic and nondeterministic aspects of anonymity

The concept of anonymity comes into play in a wide range of situations, varying from voting and anonymous donations to postings on bulletin boards and sending emails. The protocols for ensuring anonymity often use random mechanisms which can be described probabilistically, while the agents’ behavior may be totally unpredictable, irregular, and hence expressible only nondeterministically. Formal...

Anonymity in probabilistic and nondeterministic systems (Catuscia Palamidessi)

Anonymity means that the identity of the user performing a certain action is maintained secret. The protocols for ensuring anonymity often use random mechanisms which can be described probabilistically. The user, on the other hand, may be selected either nondeterministically or probabilistically. We investigate various notions of anonymity, at different levels of strength, for both the cases of...

Probabilistic and nondeterministic aspects of Anonymity

Anonymity means that the identity of the user performing a certain action is maintained secret. The protocols for ensuring anonymity often use random mechanisms which can be described probabilistically. The user, on the other hand, may be selected either nondeterministically or probabilistically. We investigate various notions of anonymity, at different levels of strength, for both the cases of...

Weak Probabilistic Anonymity (Yuxin Deng)

Anonymity means that the identity of the user performing a certain action is maintained secret. The protocols for ensuring anonymity often use random mechanisms which can be described probabilistically. In this paper we propose a notion of weak probabilistic anonymity, where weak refers to the fact that some amount of probabilistic information may be revealed by the protocol. This information c...

Publication date: 2007